EXECUTIVE SUMMARY


This report presents a methodology for the near-optimal selection of chemical sensors in a chemical sensing array. While the sensing criteria are task specific, one may generally consider a criterion that maximizes the signal strength or, conversely, minimizes the global error to be best. This criterion is quantified through the determinant of the inverse Fisher information matrix, which provides a lower bound on the determinant of the error covariance and hence on the global error volume. If a practitioner has a suitable probabilistic noise model for his or her chemical sensing array and pool of available sensors, the Fisher information matrix may be parametrized to select the best sensors after an optimization procedure. Due to the positive definite nature of the Fisher information matrix, convex optimization may be used to accomplish this task. This report presents the derivation of the supporting set-up, expressions, and constraints for this procedure.




USING FISHER INFORMATION CRITERIA FOR CHEMICAL SENSOR SELECTION

VIA CONVEX OPTIMIZATION METHODS

1. BACKGROUND AND OVERVIEW

The design of chemical sensor arrays from the standpoint of chemical sensor selection and error quantification has historically proceeded as an ad hoc process. Frequently, chemical sensors are developed not as general-purpose sensing devices, but as analyte- or chemical-class-specific detectors. When such single-purpose devices are integrated together as a chemical sensor array, it is unclear a priori how well they will function in concert with each other to provide expanded capabilities, an observation that is true of the integration of analytical instruments as well [1]. Further complicating the combination and optimization of these devices is that it is semantically unclear precisely what the combined device or array ought to do. Defining what a combined sensing device ought to do is difficult and highly dependent upon the analytical task the array is intended to support, as well as the specific goals of the array designer.

In the face of an otherwise unspecified sensing task, it is reasonable to assume that the practitioner ought to attempt to minimize the global error of the array or, conversely, to maximize the signal. This is the approach taken by the authors within this report. The question remains, however, as to how best to fulfill this objective. While a hypothetical practitioner may be able to take an exhaustive approach to sensor array design by experimentally evaluating all possible sensor combinations, this method quickly becomes infeasible as the pool of candidate sensors and the number of array slots grow, because the number of possible combinations grows combinatorially. In the rare cases when a sensor array optimization has been attempted (as opposed to using whatever sensors were immediately available), it is this combinatorial experimentation that has historically typified chemical sensor array design and, thus, severely limited the optimization of sensor arrays. Alternative approaches to array design based on neural networks and machine learning have also been tried, e.g. [1-4]. However, due to their opacity, these methods fail to provide significant insight into the chemical detection problem or to suggest subsequent ways to further improve the array design. Consequently, an explicit, precise, and mathematically rigorous approach to chemical sensor array design and optimization is greatly desired.
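As a rough illustration of why exhaustive experimentation breaks down, the Python sketch below counts the candidate designs for a pool of n sensors and an array of m slots (the symbols used in Section 3.4). It counts both the unordered sensor subsets and the slot-specific assignments; the pool sizes in the loop are illustrative values, not data from this report.

```python
from math import comb, perm


def exhaustive_search_size(n_sensors: int, n_slots: int) -> tuple[int, int]:
    """Number of designs an exhaustive search over a sensor pool would test.

    Returns (subsets, assignments): the number of ways to choose which
    sensors populate the array, and the number of ways to place distinct
    sensors into distinguishable slots.
    """
    return comb(n_sensors, n_slots), perm(n_sensors, n_slots)


for n, m in [(10, 4), (30, 8), (100, 16)]:
    subsets, assignments = exhaustive_search_size(n, m)
    print(f"n={n:3d}, m={m:2d}: {subsets:.3e} subsets, {assignments:.3e} assignments")
```

Even a modest pool of 30 sensors and 8 slots already yields millions of subsets, each of which would require its own calibration experiment.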

Given its wide range of applications, it is surprising that the literature centered on chemical sensor array optimization strategies is rather sparse, despite the relative frequency of reports describing specific sensor arrays and applications. A notable exception is the Fisher information matrix-based approach proposed by Pearce and Sánchez-Montañés and theoretically applied to simple linear sensor systems with uncorrelated noise [5-7]. Unfortunately, this methodology has not been greatly developed since its inaugural set of papers. In the view of this report's authors, this is most likely due to the mathematical complexities and difficulties of implementing this program, as well as the accompanying change in mentality it forces upon the typical practitioner in the chemical sensing field.

This report further develops the use of the Fisher information matrix as a quantitative descriptor for hypothetical chemical sensor array scenarios in which a collection of co-located sensors respond to chemical mixtures drawn from a pool of possible analytes. It assumes that the underlying sensors provide additive linear responses with respect to the system of analytes and that they may exhibit statistically correlated noise. The latter is important because correlated measurement error is realistic, yet frequently unacknowledged in the literature. The former is generally a reasonable assumption in low-concentration regimes, which typify the bulk of analytical sensing applications and present the greatest challenges regarding desired sensitivity and selectivity.

This work describes how the positive (semi)definite nature of the Fisher information matrix enables algorithmic chemical sensor array design via convex optimization techniques. This property is a rare bit of mathematical good fortune, as global optimization is, in general, computationally intractable. The use of elliptically contoured distributions as a general-purpose means of modeling correlated sensor noise is introduced and developed for the convex optimization of sensor arrays. Ultimately, this report presents a theoretical summary of this approach to chemical sensor array design and optimization by showing how to (nearly) best select a subset of sensors for a sensor array from a much larger collection while accounting for correlated noise and the specific challenges of a chemical environment. This effort was conducted in support of a basic research program seeking to develop better methodology for the design of chemical sensor arrays using techniques from information theory.


2. FISHER INFORMATION IN CHEMICAL SENSOR ARRAY DESIGN
2.1 General Applicability of Fisher Information to Sensor Systems

Both Fisher information (FI) and its generalization to multi-parameter estimation, the Fisher information matrix (FIM), are relevant to the design of statistical estimators (i.e., sensors), as their respective inverses act as lower bounds to the (co)variances of any unbiased estimator of the parameters, a property referred to as the Cramér-Rao lower bound [8].

To motivate this assertion more concretely, consider a chemical sensor array response, $\mu(\theta) + \delta(\theta)$, where $\mu(\theta)$ and $\delta(\theta)$ are the idealized sensor response vector and noise vector, respectively. Here $\theta$ denotes an external, environmentally dependent parameter vector; for chemical sensors and sensor arrays, this is typically the analyte concentration vector. Such a sensor array response may then be modeled with a probability density function, $p(X;\mu(\theta))$ [5], whose mean is $\mu(\theta) = \int dX\, X\, p(X;\mu(\theta))$ and whose covariance matrix is given by

$$\Sigma(\theta) = \int dX\, (X-\mu(\theta))(X-\mu(\theta))^T\, p(X;\mu(\theta)) \tag{1}$$


A typical goal of sensor array optimization is to minimize the global error of the sensor array. This quantity is captured by $\det(\Sigma(\theta))$, the determinant of the covariance matrix. Since $\Sigma(\theta)$ is a positive definite matrix, its determinant is strictly positive and proportional to the squared volume of the corresponding error ellipsoid, so it may act as a score or metric for the global error [9]. Thus, from the standpoint of global noise, the goal of the sensor array designer is to minimize this determinant. Unfortunately, it is often either impractical or computationally intensive to calculate $\mu(\theta)$ and $\Sigma(\theta)$ directly in a way that allows for the analytic optimization and design of arrays, particularly if a complicated estimator is used. It is also worth considering that many different physical sensor system setups or statistical estimators may be constructed for the same system (i.e., the same sensor response probability distribution). This multitude of specific estimator possibilities forces the practitioner to seek a design criterion that is robust in the face of many potentially similar but varying covariance matrices or array responses.
Fortunately, via the Cramér-Rao inequality, the inverse FI/FIM provides a lower bound, in the positive semidefinite sense, on the covariance matrix of any unbiased estimator built on such a sensor array, and this bound is independent of the actual estimator being used. It therefore expresses the fundamental analytical potential of the device. Importantly, if the practitioner tunes or re-tunes the estimation procedure applied to the array output, this quantity does not change. Thus, we conclude that the FI/FIM provides a robust metric to optimize in the design of chemical sensor arrays.

2.2 Fisher Information and the Cramér-Rao Inequality

Before showing how to utilize the FI/FIM in the context of optimization for sensor selection, it is informative to first derive the relation of the FI/FIM to the Cramér-Rao lower bound. As a prelude, the FI is defined as [10],

$$I(\theta) = \int dx\, p(x;\theta)\left(\frac{\partial \ln p(x;\theta)}{\partial \theta}\right)^2 \tag{2}$$

and each element of the FIM itself is defined as,

$$F_{ij}(\theta) = \int dX\, p(X;\theta)\,\frac{\partial \ln p(X;\theta)}{\partial \theta_i}\,\frac{\partial \ln p(X;\theta)}{\partial \theta_j} \tag{3}$$

with the FIM reducing to the FI in the univariate case. Informally, the FI/FIM may be thought of as conveying how much information an observed random variable, $x$, or set of random variables, $X$, carries about a scalar parameter $\theta$ or a parameter vector $\theta$.

When the deterministic parameters $\theta$ are statistically estimated, the inverse FI/FIM provides a lower bound to the (co)variance of any unbiased estimator, independent of the particular estimator employed. Beginning with the univariate case, this bound may be derived [11] by first considering the following expectation value,

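$$E[\hat\theta(x)-\theta] = \int dx\, (\hat\theta(x)-\theta)\, p(x;\theta) = 0 \tag{4}$$

where $\hat\theta(x)$ is an unbiased statistical estimator for $\theta$. Next, differentiating with respect to the deterministic parameter yields,

$$\frac{\partial}{\partial\theta}\int dx\, (\hat\theta(x)-\theta)\, p(x;\theta) = \int dx\, (\hat\theta(x)-\theta)\,\frac{\partial p}{\partial\theta} - \int dx\, p = 0 \tag{5}$$

Recognizing that, since $p$ is a probability distribution,

$$\int dx\, p(x;\theta) = 1 \tag{6}$$

and

$$\frac{\partial p}{\partial\theta} = p\,\frac{\partial \ln(p)}{\partial\theta} \tag{7}$$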
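which implies

$$\int dx\, (\hat\theta(x)-\theta)\, p(x;\theta)\,\frac{\partial \ln(p)}{\partial\theta} = 1 \tag{8}$$

Further manipulation of the integrand yields,

$$\int dx\,\left[(\hat\theta(x)-\theta)\sqrt{p}\right]\left[\sqrt{p}\,\frac{\partial \ln(p)}{\partial\theta}\right] = 1 \tag{9}$$

Applying the Cauchy-Schwarz inequality¹ to this manipulated integrand gives,

$$1 \le \left(\int dx\, (\hat\theta(x)-\theta)^2\, p\right)\left(\int dx\, p\left(\frac{\partial \ln(p)}{\partial\theta}\right)^2\right) \tag{10}$$

Recognizing the first factor as the variance of the unbiased estimator and the second as the FI of eq. (2), the expression resolves itself as,

$$\mathrm{Var}(\hat\theta) \ge \frac{1}{I(\theta)} \tag{11}$$

which is the Cramér-Rao lower bound for the univariate case.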
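The bound in eq. (11) can be checked numerically. The Python sketch below assumes n independent Gaussian readings of a single concentration with known noise level sigma (an illustrative model, not one taken from this report), for which $I(\theta) = n/\sigma^2$. The sample mean attains the bound, while the sample median, which is also unbiased for this symmetric noise, does not; neither can fall below it.

```python
import numpy as np


def crlb_gaussian_mean(sigma: float, n: int) -> float:
    # n i.i.d. draws from N(theta, sigma^2) carry I(theta) = n / sigma^2,
    # so eq. (11) gives Var(theta_hat) >= sigma^2 / n for any unbiased estimator.
    return sigma**2 / n


def empirical_variances(theta, sigma, n, trials=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(theta, sigma, size=(trials, n))
    var_mean = x.mean(axis=1).var(ddof=1)          # efficient estimator
    var_median = np.median(x, axis=1).var(ddof=1)  # unbiased but not efficient
    return var_mean, var_median


bound = crlb_gaussian_mean(2.0, 10)
var_mean, var_median = empirical_variances(theta=1.0, sigma=2.0, n=10)
print(f"CRLB {bound:.4f}  sample mean {var_mean:.4f}  sample median {var_median:.4f}")
```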

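The derivation of the FIM result for the multivariate case [10] is performed in a similar fashion to the univariate case by first considering,

$$E[\hat\theta(X)-\theta] = \int dX\, p(X;\theta)\,(\hat\theta(X)-\theta) = 0 \tag{12}$$

and then differentiating this equation with respect to $\theta$, so that,

$$\partial_\theta \int dX\, p(X;\theta)\,(\hat\theta(X)-\theta)^T = \int dX\,(\partial_\theta p(X;\theta))(\hat\theta(X)-\theta)^T - (\partial_\theta\theta^T)\int dX\, p(X;\theta) = 0 \tag{13}$$

where $\partial_\theta$ indicates a derivative with respect to the vector $\theta$ and $\partial_\theta\theta^T = I$. Rearranging terms as before (and transposing), it becomes,

$$\int dX\, p\,(\hat\theta(X)-\theta)(\partial_\theta \ln(p))^T = \int dX\,\left((\hat\theta(X)-\theta)\sqrt{p}\right)\left(\sqrt{p}\,\partial_\theta \ln(p)\right)^T = I \tag{14}$$

and applying the matrix form of the Cauchy-Schwarz inequality, with the cross term equal to $I$ by eq. (14), gives,

$$\int dX\, p\,(\hat\theta(X)-\theta)(\hat\theta(X)-\theta)^T \succeq I\left(\int dX\, p\,(\partial_\theta \ln(p))(\partial_\theta \ln(p))^T\right)^{-1} I \tag{15}$$

so that the inverse FIM provides a lower bound to the covariance matrix,

$$F(\mu;\theta)^{-1} \preceq \Sigma(\theta) \tag{16}$$

with the $\preceq$ relation understood in the positive semidefinite sense ($\Sigma - F^{-1}$ is positive semidefinite), and the FIM and covariance matrix ($F$ and $\Sigma$, respectively) defined as,

$$F(\mu;\theta) = \int dX\, p\,(\partial_\theta \ln(p))(\partial_\theta \ln(p))^T \tag{17}$$

and

$$\Sigma(\theta) = \int dX\, p\,(\hat\theta(X)-\theta)(\hat\theta(X)-\theta)^T \tag{18}$$

Clearly, the FIM so derived reduces to the univariate case when $\theta$ is a scalar.

¹ $|\langle u, v\rangle|^2 \le \langle u, u\rangle \cdot \langle v, v\rangle$, where $u$ and $v$ are vectors with the inner product $\langle\cdot,\cdot\rangle$.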
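The Python sketch below checks eqs. (16) and (17) numerically for one assumed model: a linear three-sensor, two-analyte array, $\mu(\theta) = A\theta$, with parameter-independent, correlated Gaussian noise of covariance $\Sigma$. For this model the score is $A^T\Sigma^{-1}(X - A\theta)$ and the FIM is $A^T\Sigma^{-1}A$; the matrices A and Sigma are made-up illustrative values. Generalized least squares is used as an unbiased estimator that reaches the bound.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed linear array: 3 sensors responding to 2 analytes, mu(theta) = A @ theta,
# with parameter-independent, correlated Gaussian noise of covariance Sigma.
A = np.array([[1.0, 0.2], [0.5, 1.0], [0.3, 0.7]])
Sigma = np.array([[1.0, 0.3, 0.0], [0.3, 0.5, 0.1], [0.0, 0.1, 0.8]])
theta = np.array([0.4, 1.2])
Sigma_inv = np.linalg.inv(Sigma)

X = rng.multivariate_normal(A @ theta, Sigma, size=200_000)

# Score d ln p / d theta = A^T Sigma^{-1} (X - A theta); one row per sample.
scores = (X - A @ theta) @ Sigma_inv @ A
F_mc = scores.T @ scores / len(X)        # Monte Carlo estimate of eq. (17)
F_exact = A.T @ Sigma_inv @ A            # closed form for this model

# Generalized least squares is unbiased for this model; its covariance
# should reach the bound of eq. (16), i.e. equal F^{-1}.
W = np.linalg.solve(F_exact, A.T @ Sigma_inv)
theta_hat = X @ W.T
cov_hat = np.cov(theta_hat, rowvar=False)

print(np.round(F_mc, 3), np.round(F_exact, 3), sep="\n")
print(np.round(cov_hat, 4), np.round(np.linalg.inv(F_exact), 4), sep="\n")
```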

2.3 Derivation of a Lower Bound to the Fisher Information Matrix

While the FI and the FIM derived in the prior section are potentially useful for optimizing a sensor array, they require a specific noise model for the sensor array, which may not be forthcoming in practice. Nonetheless, it is highly desirable for a practitioner to be able to select relevant sensors for an array in a fashion that provides some reasonable assurance of being optimal to some degree. The following lower bound to the FI and FIM provides such an assurance while satisfying the need for a metric defined by experimentally accessible parameters. Moreover, because it in essence defines a FI/FIM for a Gaussian model, it is amenable to the convex optimization framework developed subsequently.

For purposes of expediency, first consider generic vector functions [12], $h(y)$ and $f(x)$, in the vector expression,

$$\tilde f(x) = E_{xy}[h(y)f(x)^T]\, E_x[f(x)f(x)^T]^{-1} f(x) \tag{19}$$

and the positive semidefinite matrix expectation value,

$$E_{xy}\left[(h(y)-\tilde f(x))(h(y)-\tilde f(x))^T\right] \succeq 0 \tag{20}$$

After expansion and rearrangement of the prior expression, and assuming a joint probability distribution, $p(x,y)$, this yields the following matrix expression,

$$E_y[h(y)h(y)^T] \succeq E_{xy}[h(y)f(x)^T]\, E_x[f(x)f(x)^T]^{-1}\, E_{xy}[f(x)h(y)^T] \tag{21}$$




Adam  C.  Knapp  et.  al. 


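Setting $h(y) = \partial_\theta \ln p(y;\theta)$ and $f(y) = (y - \mu(\theta))$ (that is, taking $x = y$) and then integrating, using $E[\partial_\theta \ln p\,(y-\mu(\theta))^T] = (\partial\mu/\partial\theta)^T$ and $E[(y-\mu(\theta))(y-\mu(\theta))^T] = \Sigma(\theta)$, yields the following matrix inequality,

$$F(\mu;\theta) \succeq \left(\frac{\partial\mu(\theta)}{\partial\theta}\right)^T \Sigma(\theta)^{-1}\left(\frac{\partial\mu(\theta)}{\partial\theta}\right) \tag{22}$$

which provides a lower bound to the Fisher information matrix [12].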

It is noteworthy that this lower bound is the Fisher information matrix for a system with parameter-independent Gaussian noise. This result suggests two possible strategies for analyzing and optimizing a sensor array, depending on the knowledge available regarding the noise characteristics of the sensors. First, if one only has experimentally derived sensor responses and covariances available for a sensor system, one should initially optimize assuming Gaussian noise, as this represents a worst case for such a system and is thus a conservative optimization strategy. Conversely, if one has knowledge of a specific noise model for a chemical sensor system that is not Gaussian with parameter-independent covariance, it would be beneficial to use that model instead to optimize the array, since one should always be able to do at least as well as a comparable Gaussian system.
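A minimal numerical illustration of eq. (22), assuming a single sensor with unit sensitivity ($\partial\mu/\partial\theta = 1$) and Student-t noise with $\nu$ degrees of freedom and scale $s$. The Fisher information for the location of a univariate Student-t, $(\nu+1)/((\nu+3)s^2)$, is a standard textbook result and is not taken from this report; the Gaussian bound is the reciprocal of the noise variance $s^2\nu/(\nu-2)$. In each case the heavier-tailed model carries at least as much information as the Gaussian bound.

```python
import numpy as np


def gaussian_fim_bound(dmu_dtheta: np.ndarray, Sigma: np.ndarray) -> np.ndarray:
    """Right-hand side of eq. (22): (d mu/d theta)^T Sigma^{-1} (d mu/d theta)."""
    return dmu_dtheta.T @ np.linalg.solve(Sigma, dmu_dtheta)


def student_t_location_fi(nu: float, scale: float) -> float:
    """Fisher information for the location of a univariate Student-t (nu dof)."""
    return (nu + 1.0) / ((nu + 3.0) * scale**2)


scale = 1.5
for nu in [3.0, 5.0, 10.0, 50.0]:
    variance = scale**2 * nu / (nu - 2.0)     # finite only for nu > 2
    bound = gaussian_fim_bound(np.array([[1.0]]), np.array([[variance]]))[0, 0]
    exact = student_t_location_fi(nu, scale)
    print(f"nu={nu:5.1f}: Gaussian bound {bound:.4f} <= Student-t FI {exact:.4f}")
```

As $\nu$ grows the Student-t approaches the Gaussian and the two numbers converge, consistent with the bound being tight for Gaussian noise.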


3. CONVEX OPTIMIZATION OF THE FISHER INFORMATION MATRIX
3.1 Background on Convex Optimization

Recall the definition of the FIM,

$$F(\mu;\theta) = \int dX\, p\,(\partial_\theta \ln(p))(\partial_\theta \ln(p))^T \tag{23}$$

Due to its structure as a matrix defined by an integral over an outer product of vectors, the FIM is a positive semidefinite matrix, i.e., $a^T F a \ge 0$, where $a$ is an arbitrary real-valued vector of appropriate dimension. Positive semidefiniteness is crucial, as this property allows the sensor array so described to be optimized with respect to sensor configuration via convex optimization techniques.

In order to properly implement this idea for sensor array optimization, an appropriate objective function as well as a set of constraints must be specified. To set up this problem, the objective function will first be defined and then the relevant constraints detailed. In the process of setting up the constraints and detailing the supporting mathematical elements for the convex optimization, appropriate sensor responses and noise models for the chemical sensor array will be proposed.

Barring other priorities or specific knowledge of the analytical task, a reasonable design goal for a general-purpose chemical sensor array is to minimize the global error (maximize the signal) of the chemical sensor array. A useful measure for this global error is the volume of the ellipsoid cast by the covariance matrix of the relevant estimators, since this provides a reasonable metric for the global uncertainty of estimated chemical concentrations and thus for the discernibility of similar chemical mixtures. The volume of this error ellipsoid is given by the following expression [9],

$$\mathrm{vol}(\Sigma) = \frac{2\pi^{d/2}}{d\,\Gamma(d/2)}\,\det(\Sigma)^{1/2} \tag{24}$$
where $\Sigma$ is the covariance matrix and $d$ is the dimension of the volume. Minimizing this volume term, $\mathrm{vol}(\Sigma)$, ultimately requires the minimization of $\det(\Sigma)^{1/2}$, as all other terms of the volume expression are related to the system dimension, which is not subject to optimization.
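Eq. (24) is straightforward to evaluate. The Python sketch below assumes the ellipsoid in question is the unit Mahalanobis ellipsoid, $\{x : x^T\Sigma^{-1}x \le 1\}$, uses a log-determinant for numerical stability, and checks two cases with known answers: an ellipse with semi-axes 2 and 1 (area $2\pi$) and the unit ball in three dimensions (volume $4\pi/3$).

```python
import math

import numpy as np


def error_ellipsoid_volume(Sigma: np.ndarray) -> float:
    """Volume of {x : x^T Sigma^{-1} x <= 1}, following eq. (24)."""
    d = Sigma.shape[0]
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    prefactor = 2.0 * math.pi ** (d / 2) / (d * math.gamma(d / 2))
    return prefactor * math.exp(0.5 * logdet)


print(error_ellipsoid_volume(np.diag([4.0, 1.0])))  # ellipse, semi-axes 2 and 1
print(error_ellipsoid_volume(np.eye(3)))            # unit ball in 3-D
```

Only the $\det(\Sigma)^{1/2}$ factor depends on the array design, which is why the prefactor drops out of the optimization.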

Since the square root is a monotonically increasing function for $x > 0$, and $\det(\Sigma) > 0$ for a positive definite $\Sigma$, minimizing $\det(\Sigma)^{1/2}$ is equivalent to minimizing $\det(\Sigma)$, and the objective function may be simplified to $\det(\Sigma)$. For reasons of subsequent numerical convenience, this objective function is composed with the (also monotonically increasing) natural logarithm to give $\ln(\det(X))$, with $X = \Sigma$, as the final objective function. It is proven below that this function is concave for all positive definite matrices, $X$, by showing that its restriction to an arbitrary line through the positive definite matrices is concave [13].

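First consider the following,

$$g(t) = \ln(\det(X)) = \ln(\det(Z + tV)) \tag{25}$$

where $X = Z + tV \succ 0$, $Z$ is a positive definite matrix, $V$ is an arbitrary symmetric matrix, and $t$ is a scalar parameter restricted to values for which $Z + tV$ remains positive definite. Manipulating the matrix function in question so that only positive definite factors appear yields,

$$g(t) = \ln(\det(Z + tV)) = \ln\left(\det\left(Z^{1/2}\left(I + tZ^{-1/2}VZ^{-1/2}\right)Z^{1/2}\right)\right) = \sum_{i=1}^{d}\ln(1 + t\lambda_i) + \ln(\det(Z)) \tag{26}$$

where the $\lambda_i$ are the eigenvalues of $Z^{-1/2}VZ^{-1/2}$, so that the first and second derivatives of $g(t)$ may be taken as,

$$g'(t) = \sum_{i=1}^{d}\frac{\lambda_i}{1 + t\lambda_i} \tag{27}$$

and

$$g''(t) = -\sum_{i=1}^{d}\frac{\lambda_i^2}{(1 + t\lambda_i)^2} \tag{28}$$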


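Since $V$ is symmetric, the $\lambda_i$ are real, and $1 + t\lambda_i > 0$ whenever $Z + tV$ is positive definite; it therefore follows that $g''(t) \le 0$ for every admissible $t$. This implies that $\ln(\det(X))$ is a concave function of positive definite $X$, and hence that $-\ln(\det(X))$ is convex [13].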
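The concavity argument of eqs. (25)-(28) can be spot-checked numerically. The Python sketch below draws two random positive definite matrices Z and W (illustrative, not sensor data) and sets V = W - Z, so that Z + tV stays positive definite for t in [0, 1]. It checks that ln det lies above its chord on that segment and that eq. (28) matches a finite-difference second derivative.

```python
import numpy as np

rng = np.random.default_rng(3)


def random_spd(d):
    a = rng.normal(size=(d, d))
    return a @ a.T + d * np.eye(d)


def logdet(X):
    sign, value = np.linalg.slogdet(X)
    if sign <= 0:
        raise ValueError("matrix must be positive definite")
    return value


d = 5
Z, W = random_spd(d), random_spd(d)
V = W - Z                         # symmetric, generally indefinite direction

# Concavity along the segment Z + tV = (1 - t) Z + t W, t in [0, 1]
ts = np.linspace(0.0, 1.0, 11)
g = np.array([logdet(Z + t * V) for t in ts])
chord = (1.0 - ts) * g[0] + ts * g[-1]
print("ln det above its chord:", bool(np.all(g >= chord - 1e-12)))

# Eq. (28) versus a central finite difference at t = 0.4
lam = np.linalg.eigvals(np.linalg.solve(Z, V)).real   # same spectrum as Z^{-1/2} V Z^{-1/2}
t, h = 0.4, 1e-4
g2_formula = -np.sum(lam**2 / (1.0 + t * lam) ** 2)
g2_numeric = (logdet(Z + (t + h) * V) - 2.0 * logdet(Z + t * V)
              + logdet(Z + (t - h) * V)) / h**2
print(g2_formula, g2_numeric)
```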

Having shown that this objective function is valid for the optimization problem, it is now important to consider which variables are actually used to optimize the $\ln(\det(X))$ objective. Specifically, it is not the covariance matrix that is input into the objective function, but the inverse Fisher information matrix. This substitution is justified by the Cramér-Rao lower bound (just as substituting the Gaussian FIM of eq. (22) would be justified in the case of the corresponding upper bound).
Consequently, it is the FIM of a probability distribution, and not its covariance matrix, that is parametrized for optimization, and the objective function thus becomes,

$$\ln(\det(\Sigma(\theta))) \ge \ln(\det(F^{-1}(\theta;s))) = -\ln(\det(F(\theta;s))) \tag{29}$$


where $s$ are the slack variables subject to the optimization. For a given convex optimization, in addition to the inequality and equality constraints, the practitioner must also supply the convex optimization routine with gradient and Hessian routines for the objective function in the slack variables [13].

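The gradient of the objective function, $-\ln(\det(F(\theta;s)))$, is given by,

$$-\nabla_s \ln(\det(F(\theta;s))) = -\sum_{i=1}^{\#(s)} e_i\,\mathrm{Tr}\left(F^{-1}\frac{\partial F}{\partial s_i}\right) \tag{30}$$

where $\#(s)$ is the cardinality of the slack variables, $s$, and the $e_i$ denote the relevant vector basis set. The matrix elements of the Hessian, $h(\theta;s)$, are defined by,

$$h_{pq}(\theta;s) = -\frac{\partial^2 \ln(\det(F))}{\partial s_p\,\partial s_q} = \mathrm{Tr}\left(F^{-1}\frac{\partial F}{\partial s_p}F^{-1}\frac{\partial F}{\partial s_q}\right) - \mathrm{Tr}\left(F^{-1}\frac{\partial^2 F}{\partial s_p\,\partial s_q}\right) \tag{31}$$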

To evaluate these quantities, it is necessary to first choose a noise model so that the FIM may be properly parametrized for optimization. This matter is discussed in the following section.
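Eqs. (30) and (31) can be validated against finite differences before they are handed to a solver. The Python sketch below uses a hypothetical FIM parametrization, F(s) = F0 + s0·C0 + s1·C1 + s0·s1·D with random symmetric matrices, chosen only because its derivatives are easy to write down; it is not the ECD parametrization developed later.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_s = 3, 2


def sym(a):
    return 0.5 * (a + a.T)


F0 = 5.0 * np.eye(d)
C = [sym(rng.normal(size=(d, d))) for _ in range(n_s)]
D = sym(rng.normal(size=(d, d)))


# Hypothetical FIM parametrization: F(s) = F0 + s0*C0 + s1*C1 + s0*s1*D
def F(s):
    return F0 + s[0] * C[0] + s[1] * C[1] + s[0] * s[1] * D


def dF(s, k):                      # dF/ds_k
    return C[k] + (s[1] if k == 0 else s[0]) * D


def d2F(p, q):                     # d^2F/(ds_p ds_q)
    return D if p != q else np.zeros((d, d))


def objective(s):                  # -ln det F(s)
    return -np.linalg.slogdet(F(s))[1]


def gradient(s):                   # eq. (30)
    Finv = np.linalg.inv(F(s))
    return np.array([-np.trace(Finv @ dF(s, k)) for k in range(n_s)])


def hessian(s):                    # eq. (31)
    Finv = np.linalg.inv(F(s))
    H = np.empty((n_s, n_s))
    for p in range(n_s):
        for q in range(n_s):
            H[p, q] = (np.trace(Finv @ dF(s, p) @ Finv @ dF(s, q))
                       - np.trace(Finv @ d2F(p, q)))
    return H


s = np.array([0.3, -0.4])
h = 1e-5
e = np.eye(n_s)
g_num = np.array([(objective(s + h * e[k]) - objective(s - h * e[k])) / (2 * h)
                  for k in range(n_s)])
H_num = np.array([(gradient(s + h * e[k]) - gradient(s - h * e[k])) / (2 * h)
                  for k in range(n_s)])
print(np.allclose(gradient(s), g_num, atol=1e-7), np.allclose(hessian(s), H_num, atol=1e-6))
```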

3.2 Elliptically Contoured Distributions: A Correlated Noise Model for Chemical Sensor Arrays

Elliptically contoured distributions (ECDs) [14] are a class of statistical models that generalize the multivariate Gaussian and include many standard statistical models, such as the multivariate Student's t-distribution. They are defined as follows,

$$p(X;\theta) = \frac{1}{N(\theta)}\, g\!\left((X-\mu(\theta))^T\Sigma(\theta)^{-1}(X-\mu(\theta))\right) \tag{32}$$

where $g(\cdot)$ is an arbitrary nonnegative univariate function (the density generator), $\theta$ are the external deterministic parameters whose estimates are bounded by the FIM, $\mu(\theta)$ is the mean response function, $\Sigma(\theta)$ is a positive definite scale matrix which reduces to the covariance matrix if $g(\cdot) = \exp(-(\cdot))$, and $N(\theta)$ is the normalization constant for the probability density function. These distributions are chosen to model the correlated noise of chemical sensor arrays because they can model correlation among sensor responses while remaining both analytically tractable and relatively general.

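Examples of FIMs for various well-known probability distributions are given by the following. The FIM for the multivariate Gaussian is given by the so-called Slepian-Bangs formula as [15],

$$F_{ij}(\theta) = 2\left(\frac{\partial\mu}{\partial\theta_i}\right)^T\Sigma^{-1}\left(\frac{\partial\mu}{\partial\theta_j}\right) + \mathrm{Tr}\left(\Sigma^{-1}\Sigma_i\Sigma^{-1}\Sigma_j\right) \tag{33}$$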
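where $\Sigma_i = \partial\Sigma/\partial\theta_i$, and the FIM for the multivariate Student-t distribution [15] is

$$F_{ij}(\theta) = 2\,\frac{d+M}{d+M+1}\left(\frac{\partial\mu}{\partial\theta_i}\right)^T\Sigma^{-1}\left(\frac{\partial\mu}{\partial\theta_j}\right) - \frac{1}{d+M+1}\,\mathrm{Tr}\left(\Sigma^{-1}\Sigma_i\right)\mathrm{Tr}\left(\Sigma^{-1}\Sigma_j\right) + \frac{d+M}{d+M+1}\,\mathrm{Tr}\left(\Sigma^{-1}\Sigma_i\Sigma^{-1}\Sigma_j\right) \tag{34}$$

where $d$ is the degrees of freedom of the Student-t distribution, a distribution-specific quantity, and $M$ is the rank of the scale matrix $\Sigma$.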


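The FIM for ECDs [15] has recently been derived as a generalization of the Slepian-Bangs formula as

$$F_{ij}(\theta) = \frac{2E_p[q\phi^2(q)]}{M}\left(\frac{\partial\mu}{\partial\theta_i}\right)^T\Sigma^{-1}\left(\frac{\partial\mu}{\partial\theta_j}\right) + \left(\frac{E_p[q^2\phi^2(q)]}{M(M+1)} - 1\right)\mathrm{Tr}\left(\Sigma^{-1}\Sigma_i\right)\mathrm{Tr}\left(\Sigma^{-1}\Sigma_j\right) + \frac{E_p[q^2\phi^2(q)]}{M(M+1)}\,\mathrm{Tr}\left(\Sigma^{-1}\Sigma_i\Sigma^{-1}\Sigma_j\right) \tag{35}$$

where $M$ is the scalar dimensionality or rank of the scale matrix, $\Sigma$, and $E_p[\cdot]$ denotes an expectation value with regard to the probability density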


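$$p(q) = \frac{1}{\delta_{M,g}}\, q^{M-1} g(q) \tag{36}$$

so that,

$$E_p[\cdot] = \frac{1}{\delta_{M,g}}\int_0^\infty dq\,(\cdot)\, q^{M-1} g(q) \tag{37}$$

where

$$\delta_{M,g} = \int_0^\infty dt\, t^{M-1} g(t) \tag{38}$$

and

$$\phi(t) = \frac{g'(t)}{g(t)} \tag{39}$$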


Using ECDs and their corresponding FIMs as reasonable models for correlated chemical sensor arrays allows the practitioner to propose a specific model for convex optimization.
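The distribution-specific expectations in eq. (35) can be evaluated numerically for any density generator. The Python sketch below computes the three coefficients of eq. (35) from eqs. (36)-(39). The quadrature scheme (mapping $q \in (0,\infty)$ onto $u \in (0,1)$ via $q = u/(1-u)$ and applying Gauss-Legendre nodes) is our own implementation choice. As a sanity check it uses $g(t) = \exp(-t)$, for which eq. (35) should reduce to eq. (33), and the Student-t generator $g(t) = (1 + t/\nu)^{-(\nu+M)}$, which should reproduce the coefficients of eq. (34).

```python
import numpy as np


def ecd_fim_coefficients(g, phi, M, n_nodes=400):
    """Coefficients of the three terms of eq. (35) for density generator g.

    phi must return g'(q)/g(q) (eq. 39). Expectations use p(q) of eq. (36);
    integrals over q in (0, inf) are mapped onto u in (0, 1) with
    q = u / (1 - u) and evaluated by Gauss-Legendre quadrature.
    """
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    u, w = 0.5 * (x + 1.0), 0.5 * w                      # nodes/weights on (0, 1)
    q = u / (1.0 - u)
    weights = w * q ** (M - 1) * g(q) / (1.0 - u) ** 2   # includes dq/du
    delta = weights.sum()                                # eq. (38)
    E_q_phi2 = np.sum(weights * q * phi(q) ** 2) / delta
    E_q2_phi2 = np.sum(weights * q**2 * phi(q) ** 2) / delta
    alpha = 2.0 * E_q_phi2 / M                           # mean term
    gamma = E_q2_phi2 / (M * (M + 1))                    # Tr(S^-1 S_i S^-1 S_j) term
    beta = gamma - 1.0                                   # Tr(.)Tr(.) term
    return alpha, beta, gamma


M = 3
coeffs_gauss = ecd_fim_coefficients(lambda t: np.exp(-t), lambda t: -np.ones_like(t), M)

nu = 5.0
coeffs_t = ecd_fim_coefficients(lambda t: (1.0 + t / nu) ** (-(nu + M)),
                                lambda t: -(nu + M) / (nu + t), M)
expected_t = (2 * (nu + M) / (nu + M + 1), -1 / (nu + M + 1), (nu + M) / (nu + M + 1))
print(coeffs_gauss)            # should be close to (2, 0, 1), i.e. eq. (33)
print(coeffs_t, expected_t)    # should match eq. (34)
```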
3.3 Gradients and Hessians for the Fisher Information Matrices of Elliptically Contoured Distributions

Recall from the prior section that the model-dependent components of the gradient and Hessian matrix for the convex optimization of the FIM are the matrices $\partial F/\partial s_p$ and $\partial^2 F/\partial s_p\partial s_q$. The expressions for the matrix elements of these matrices are developed in this subsection.

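First, express the ECD FIM elements as follows,

$$F^{ECD}_{ij} = \alpha\,\underbrace{\left(\frac{\partial\mu}{\partial\theta_i}\right)^T\Sigma^{-1}\left(\frac{\partial\mu}{\partial\theta_j}\right)}_{G} + \beta\,\underbrace{\mathrm{Tr}\left(\Sigma^{-1}\Sigma_i\right)\mathrm{Tr}\left(\Sigma^{-1}\Sigma_j\right)}_{J} + \gamma\,\underbrace{\mathrm{Tr}\left(\Sigma^{-1}\Sigma_i\Sigma^{-1}\Sigma_j\right)}_{K} \tag{40}$$

where $\alpha$, $\beta$, and $\gamma$ are distribution-specific constants that do not depend upon the slack variables, $s_p$ and $s_q$, and are given by,

$$\alpha = \frac{2E_p[q\phi^2(q)]}{M} \tag{41}$$

$$\beta = \frac{E_p[q^2\phi^2(q)]}{M(M+1)} - 1 \tag{42}$$

$$\gamma = \frac{E_p[q^2\phi^2(q)]}{M(M+1)} \tag{43}$$

where the subscript $p$ denotes an average with respect to $p(q)$, $\Sigma_i = \partial\Sigma/\partial\theta_i$, and $G$, $J$, and $K$ are so defined to simplify the derivation and presentation.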
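Eq. (40) translates directly into array code. The Python sketch below assembles $F^{ECD}$ from the mean Jacobian, the scale matrix, and its parameter derivatives; the function name and argument layout are our own. The usage checks two cases that can be done by hand. The first is a linear mean with a parameter-independent scale matrix and the coefficients $(\alpha, \beta, \gamma) = (2, 0, 1)$ of the $g(t) = \exp(-t)$ case, where the result must equal $2A^T\Sigma^{-1}A$ as in eq. (33). The second is a pure scale parameter, $\Sigma = \theta\Sigma_0$, where $F = (\gamma M + \beta M^2)/\theta^2$. The matrices are illustrative values.

```python
import numpy as np


def ecd_fim(dmu, Sigma, dSigma, alpha, beta, gamma):
    """F_ij = alpha*G_ij + beta*J_ij + gamma*K_ij, following eq. (40).

    dmu    : (M, P) array whose column i is d mu / d theta_i
    Sigma  : (M, M) positive definite scale matrix
    dSigma : sequence of P (M, M) arrays, dSigma[i] = d Sigma / d theta_i
    """
    n_par = dmu.shape[1]
    Sigma_inv = np.linalg.inv(Sigma)
    G = dmu.T @ Sigma_inv @ dmu
    traces = np.array([np.trace(Sigma_inv @ dS) for dS in dSigma])
    J = np.outer(traces, traces)
    K = np.array([[np.trace(Sigma_inv @ dSigma[i] @ Sigma_inv @ dSigma[j])
                   for j in range(n_par)] for i in range(n_par)])
    return alpha * G + beta * J + gamma * K


# Linear mean mu = A theta, parameter-independent Sigma, coefficients (2, 0, 1)
A = np.array([[1.0, 0.2], [0.5, 1.0], [0.3, 0.7]])
Sigma = np.array([[1.0, 0.3, 0.0], [0.3, 0.5, 0.1], [0.0, 0.1, 0.8]])
F_lin = ecd_fim(A, Sigma, [np.zeros((3, 3))] * 2, alpha=2.0, beta=0.0, gamma=1.0)

# Pure scale parameter: Sigma = theta * Sigma0, so Sigma^{-1} Sigma_theta = I / theta
theta_scale = 2.0
F_scale = ecd_fim(np.zeros((3, 1)), theta_scale * Sigma, [Sigma],
                  alpha=2.0, beta=0.0, gamma=1.0)
print(F_lin, F_scale, sep="\n")
```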


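The derivatives of each of these sub-expressions with respect to a slack variable $s_p$ are given as follows,

$$\frac{\partial G}{\partial s_p} = \left(\frac{\partial^2\mu}{\partial s_p\partial\theta_i}\right)^T\Sigma^{-1}\frac{\partial\mu}{\partial\theta_j} - \left(\frac{\partial\mu}{\partial\theta_i}\right)^T\Sigma^{-1}\Sigma_{s_p}\Sigma^{-1}\frac{\partial\mu}{\partial\theta_j} + \left(\frac{\partial\mu}{\partial\theta_i}\right)^T\Sigma^{-1}\frac{\partial^2\mu}{\partial s_p\partial\theta_j} \tag{44}$$

$$\frac{\partial J}{\partial s_p} = \mathrm{Tr}\left(-\Sigma^{-1}\Sigma_{s_p}\Sigma^{-1}\Sigma_i + \Sigma^{-1}\Sigma_{is_p}\right)\mathrm{Tr}\left(\Sigma^{-1}\Sigma_j\right) + \mathrm{Tr}\left(\Sigma^{-1}\Sigma_i\right)\mathrm{Tr}\left(-\Sigma^{-1}\Sigma_{s_p}\Sigma^{-1}\Sigma_j + \Sigma^{-1}\Sigma_{js_p}\right) \tag{45}$$

$$\frac{\partial K}{\partial s_p} = \mathrm{Tr}\left(-\Sigma^{-1}\Sigma_{s_p}\Sigma^{-1}\Sigma_i\Sigma^{-1}\Sigma_j + \Sigma^{-1}\Sigma_{is_p}\Sigma^{-1}\Sigma_j - \Sigma^{-1}\Sigma_i\Sigma^{-1}\Sigma_{s_p}\Sigma^{-1}\Sigma_j + \Sigma^{-1}\Sigma_i\Sigma^{-1}\Sigma_{js_p}\right) \tag{46}$$

where $\Sigma_{s_p} = \partial\Sigma/\partial s_p$ and $\Sigma_{is_p} = \partial^2\Sigma/\partial s_p\partial\theta_i$; the minus signs arise from $\partial\Sigma^{-1}/\partial s_p = -\Sigma^{-1}\Sigma_{s_p}\Sigma^{-1}$.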
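Because sign errors are easy to make in eqs. (45) and (46), a finite-difference check is worthwhile. The Python sketch below assumes a hypothetical scale matrix with two parameters and one slack variable, $\Sigma(\theta, s) = S_0 + \sum_i \theta_i(B_i + sD_i) + sC$, with random symmetric matrices, so that $\Sigma_i$, $\Sigma_s$, and $\Sigma_{is}$ are known in closed form. It then compares both formulas with central differences in $s$.

```python
import numpy as np

rng = np.random.default_rng(5)
m = 4


def sym(a):
    return 0.5 * (a + a.T)


# Hypothetical scale matrix: Sigma(theta, s) = S0 + sum_i theta_i (B_i + s D_i) + s C
S0 = 6.0 * np.eye(m)
B = [sym(rng.normal(size=(m, m))) for _ in range(2)]
D = [sym(rng.normal(size=(m, m))) for _ in range(2)]
C = sym(rng.normal(size=(m, m)))
theta = np.array([0.3, -0.2])


def Sigma(s):
    return S0 + s * C + sum(t * (b + s * dd) for t, b, dd in zip(theta, B, D))


def Sigma_i(s, i):          # d Sigma / d theta_i
    return B[i] + s * D[i]


def Sigma_s():              # d Sigma / d s
    return C + sum(t * dd for t, dd in zip(theta, D))


def Sigma_is(i):            # d^2 Sigma / (d s d theta_i)
    return D[i]


def J(s, i, j):
    inv = np.linalg.inv(Sigma(s))
    return np.trace(inv @ Sigma_i(s, i)) * np.trace(inv @ Sigma_i(s, j))


def K(s, i, j):
    inv = np.linalg.inv(Sigma(s))
    return np.trace(inv @ Sigma_i(s, i) @ inv @ Sigma_i(s, j))


def dJ_ds(s, i, j):         # eq. (45)
    inv, Ss = np.linalg.inv(Sigma(s)), Sigma_s()

    def dtrace(k):          # d/ds Tr(Sigma^{-1} Sigma_k)
        return np.trace(-inv @ Ss @ inv @ Sigma_i(s, k) + inv @ Sigma_is(k))

    return (dtrace(i) * np.trace(inv @ Sigma_i(s, j))
            + np.trace(inv @ Sigma_i(s, i)) * dtrace(j))


def dK_ds(s, i, j):         # eq. (46)
    inv, Ss = np.linalg.inv(Sigma(s)), Sigma_s()
    Si, Sj = Sigma_i(s, i), Sigma_i(s, j)
    return np.trace(-inv @ Ss @ inv @ Si @ inv @ Sj + inv @ Sigma_is(i) @ inv @ Sj
                    - inv @ Si @ inv @ Ss @ inv @ Sj + inv @ Si @ inv @ Sigma_is(j))


s, h = 0.1, 1e-5
for f, df in [(J, dJ_ds), (K, dK_ds)]:
    numeric = (f(s + h, 0, 1) - f(s - h, 0, 1)) / (2 * h)
    print(f.__name__, df(s, 0, 1), numeric)
```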
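The Hessian elements for the $G$ and $J$ terms are given below. For brevity they are written in terms of the derivatives of the inverse scale matrix,

$$\frac{\partial\Sigma^{-1}}{\partial s_p} = -\Sigma^{-1}\Sigma_{s_p}\Sigma^{-1}, \qquad \frac{\partial^2\Sigma^{-1}}{\partial s_p\partial s_q} = \Sigma^{-1}\left(\Sigma_{s_p}\Sigma^{-1}\Sigma_{s_q} + \Sigma_{s_q}\Sigma^{-1}\Sigma_{s_p} - \Sigma_{s_ps_q}\right)\Sigma^{-1}$$

so that

$$\begin{aligned}\frac{\partial^2 G}{\partial s_p\partial s_q} ={}& \left(\frac{\partial^3\mu}{\partial s_p\partial s_q\partial\theta_i}\right)^T\Sigma^{-1}\frac{\partial\mu}{\partial\theta_j} + \left(\frac{\partial\mu}{\partial\theta_i}\right)^T\Sigma^{-1}\frac{\partial^3\mu}{\partial s_p\partial s_q\partial\theta_j} + \left(\frac{\partial\mu}{\partial\theta_i}\right)^T\frac{\partial^2\Sigma^{-1}}{\partial s_p\partial s_q}\frac{\partial\mu}{\partial\theta_j}\\ &+ \left(\frac{\partial^2\mu}{\partial s_p\partial\theta_i}\right)^T\Sigma^{-1}\frac{\partial^2\mu}{\partial s_q\partial\theta_j} + \left(\frac{\partial^2\mu}{\partial s_q\partial\theta_i}\right)^T\Sigma^{-1}\frac{\partial^2\mu}{\partial s_p\partial\theta_j}\\ &+ \left(\frac{\partial^2\mu}{\partial s_p\partial\theta_i}\right)^T\frac{\partial\Sigma^{-1}}{\partial s_q}\frac{\partial\mu}{\partial\theta_j} + \left(\frac{\partial^2\mu}{\partial s_q\partial\theta_i}\right)^T\frac{\partial\Sigma^{-1}}{\partial s_p}\frac{\partial\mu}{\partial\theta_j}\\ &+ \left(\frac{\partial\mu}{\partial\theta_i}\right)^T\frac{\partial\Sigma^{-1}}{\partial s_p}\frac{\partial^2\mu}{\partial s_q\partial\theta_j} + \left(\frac{\partial\mu}{\partial\theta_i}\right)^T\frac{\partial\Sigma^{-1}}{\partial s_q}\frac{\partial^2\mu}{\partial s_p\partial\theta_j}\end{aligned} \tag{47}$$

$$\begin{aligned}\frac{\partial^2 J}{\partial s_p\partial s_q} ={}& \mathrm{Tr}\left(\frac{\partial^2\Sigma^{-1}}{\partial s_p\partial s_q}\Sigma_i + \frac{\partial\Sigma^{-1}}{\partial s_p}\Sigma_{is_q} + \frac{\partial\Sigma^{-1}}{\partial s_q}\Sigma_{is_p} + \Sigma^{-1}\Sigma_{is_ps_q}\right)\mathrm{Tr}\left(\Sigma^{-1}\Sigma_j\right)\\ &+ \mathrm{Tr}\left(\frac{\partial\Sigma^{-1}}{\partial s_p}\Sigma_i + \Sigma^{-1}\Sigma_{is_p}\right)\mathrm{Tr}\left(\frac{\partial\Sigma^{-1}}{\partial s_q}\Sigma_j + \Sigma^{-1}\Sigma_{js_q}\right)\\ &+ \mathrm{Tr}\left(\frac{\partial\Sigma^{-1}}{\partial s_q}\Sigma_i + \Sigma^{-1}\Sigma_{is_q}\right)\mathrm{Tr}\left(\frac{\partial\Sigma^{-1}}{\partial s_p}\Sigma_j + \Sigma^{-1}\Sigma_{js_p}\right)\\ &+ \mathrm{Tr}\left(\Sigma^{-1}\Sigma_i\right)\mathrm{Tr}\left(\frac{\partial^2\Sigma^{-1}}{\partial s_p\partial s_q}\Sigma_j + \frac{\partial\Sigma^{-1}}{\partial s_p}\Sigma_{js_q} + \frac{\partial\Sigma^{-1}}{\partial s_q}\Sigma_{js_p} + \Sigma^{-1}\Sigma_{js_ps_q}\right)\end{aligned} \tag{48}$$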


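and the Hessian element for $K$ is set up as follows,

$$\frac{\partial^2 K}{\partial s_p\partial s_q} = \frac{\partial A_1}{\partial s_q} + \frac{\partial A_2}{\partial s_q} + \frac{\partial A_3}{\partial s_q} + \frac{\partial A_4}{\partial s_q} \tag{49}$$

where $A_1, \ldots, A_4$ are the four terms of eq. (46), written with $\partial\Sigma^{-1}/\partial s_p = -\Sigma^{-1}\Sigma_{s_p}\Sigma^{-1}$,

$$A_1 = \mathrm{Tr}\left(\frac{\partial\Sigma^{-1}}{\partial s_p}\Sigma_i\Sigma^{-1}\Sigma_j\right),\quad A_2 = \mathrm{Tr}\left(\Sigma^{-1}\Sigma_{is_p}\Sigma^{-1}\Sigma_j\right),\quad A_3 = \mathrm{Tr}\left(\Sigma^{-1}\Sigma_i\frac{\partial\Sigma^{-1}}{\partial s_p}\Sigma_j\right),\quad A_4 = \mathrm{Tr}\left(\Sigma^{-1}\Sigma_i\Sigma^{-1}\Sigma_{js_p}\right)$$

and the derivatives of each of these sub-terms, with $\partial^2\Sigma^{-1}/\partial s_p\partial s_q$ obtained by differentiating $\partial\Sigma^{-1}/\partial s_p$ once more with respect to $s_q$, are given by the following:

$$\frac{\partial A_1}{\partial s_q} = \mathrm{Tr}\left(\frac{\partial^2\Sigma^{-1}}{\partial s_p\partial s_q}\Sigma_i\Sigma^{-1}\Sigma_j + \frac{\partial\Sigma^{-1}}{\partial s_p}\Sigma_{is_q}\Sigma^{-1}\Sigma_j + \frac{\partial\Sigma^{-1}}{\partial s_p}\Sigma_i\frac{\partial\Sigma^{-1}}{\partial s_q}\Sigma_j + \frac{\partial\Sigma^{-1}}{\partial s_p}\Sigma_i\Sigma^{-1}\Sigma_{js_q}\right) \tag{50}$$

$$\frac{\partial A_2}{\partial s_q} = \mathrm{Tr}\left(\frac{\partial\Sigma^{-1}}{\partial s_q}\Sigma_{is_p}\Sigma^{-1}\Sigma_j + \Sigma^{-1}\Sigma_{is_ps_q}\Sigma^{-1}\Sigma_j + \Sigma^{-1}\Sigma_{is_p}\frac{\partial\Sigma^{-1}}{\partial s_q}\Sigma_j + \Sigma^{-1}\Sigma_{is_p}\Sigma^{-1}\Sigma_{js_q}\right) \tag{51}$$

$$\frac{\partial A_3}{\partial s_q} = \mathrm{Tr}\left(\frac{\partial\Sigma^{-1}}{\partial s_q}\Sigma_i\frac{\partial\Sigma^{-1}}{\partial s_p}\Sigma_j + \Sigma^{-1}\Sigma_{is_q}\frac{\partial\Sigma^{-1}}{\partial s_p}\Sigma_j + \Sigma^{-1}\Sigma_i\frac{\partial^2\Sigma^{-1}}{\partial s_p\partial s_q}\Sigma_j + \Sigma^{-1}\Sigma_i\frac{\partial\Sigma^{-1}}{\partial s_p}\Sigma_{js_q}\right) \tag{52}$$

$$\frac{\partial A_4}{\partial s_q} = \mathrm{Tr}\left(\frac{\partial\Sigma^{-1}}{\partial s_q}\Sigma_i\Sigma^{-1}\Sigma_{js_p} + \Sigma^{-1}\Sigma_{is_q}\Sigma^{-1}\Sigma_{js_p} + \Sigma^{-1}\Sigma_i\frac{\partial\Sigma^{-1}}{\partial s_q}\Sigma_{js_p} + \Sigma^{-1}\Sigma_i\Sigma^{-1}\Sigma_{js_ps_q}\right) \tag{53}$$

where $\Sigma_{ab} = \partial^2\Sigma/\partial a\,\partial b$ and $\Sigma_{abc} = \partial^3\Sigma/\partial a\,\partial b\,\partial c$ for $a, b, c \in \{\theta_i, \theta_j, s_p, s_q\}$.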
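The second-derivative terms of eqs. (49)-(53) can be checked in the same way. The Python sketch below assumes a hypothetical scale matrix with two parameters and two slack variables, $\Sigma = S_0 + \sum_i \theta_i(B_i + s_pD_i + s_ps_qH_i) + s_pC_p + s_qC_q + s_ps_qE$. For this form every $\Sigma_{ab}$ and $\Sigma_{abc}$ is available in closed form. The sketch assembles $\partial^2 K/\partial s_p\partial s_q$ from $A_1, \ldots, A_4$ and compares it with a central difference of eq. (46) in $s_q$, and eq. (46) with a central difference of $K$ in $s_p$.

```python
import numpy as np

rng = np.random.default_rng(6)
m = 4


def rand_sym():
    a = rng.normal(size=(m, m))
    return 0.5 * (a + a.T)


S0 = 6.0 * np.eye(m)
B = [rand_sym() for _ in range(2)]
D = [rand_sym() for _ in range(2)]
H = [rand_sym() for _ in range(2)]
Cp, Cq, E = rand_sym(), rand_sym(), rand_sym()
theta = np.array([0.3, -0.2])


def parts(s):
    """Sigma and its derivative matrices Sigma_ab, Sigma_abc at s = (s_p, s_q)."""
    sp, sq = s
    return {
        "S": S0 + sp * Cp + sq * Cq + sp * sq * E
             + sum(t * (b + sp * d + sp * sq * h) for t, b, d, h in zip(theta, B, D, H)),
        "Si": [b + sp * d + sp * sq * h for b, d, h in zip(B, D, H)],
        "Sp": Cp + sq * E + sum(t * (d + sq * h) for t, d, h in zip(theta, D, H)),
        "Sq": Cq + sp * E + sum(t * sp * h for t, h in zip(theta, H)),
        "Spq": E + sum(t * h for t, h in zip(theta, H)),
        "Sip": [d + sq * h for d, h in zip(D, H)],
        "Siq": [sp * h for h in H],
        "Sipq": list(H),
    }


def K(s, i, j):
    d = parts(s)
    P = np.linalg.inv(d["S"])
    return np.trace(P @ d["Si"][i] @ P @ d["Si"][j])


def dK_dsp(s, i, j):                 # eq. (46) = A1 + A2 + A3 + A4
    d = parts(s)
    P = np.linalg.inv(d["S"])
    dPp = -P @ d["Sp"] @ P
    Si, Sj = d["Si"][i], d["Si"][j]
    return np.trace(dPp @ Si @ P @ Sj + P @ d["Sip"][i] @ P @ Sj
                    + P @ Si @ dPp @ Sj + P @ Si @ P @ d["Sip"][j])


def d2K_dspdsq(s, i, j):             # eq. (49) assembled from eqs. (50)-(53)
    d = parts(s)
    P = np.linalg.inv(d["S"])
    dPp, dPq = -P @ d["Sp"] @ P, -P @ d["Sq"] @ P
    d2P = P @ (d["Sp"] @ P @ d["Sq"] + d["Sq"] @ P @ d["Sp"] - d["Spq"]) @ P
    Si, Sj = d["Si"][i], d["Si"][j]
    Sip, Sjp, Siq, Sjq = d["Sip"][i], d["Sip"][j], d["Siq"][i], d["Siq"][j]
    Sipq, Sjpq = d["Sipq"][i], d["Sipq"][j]
    dA1 = np.trace(d2P @ Si @ P @ Sj + dPp @ Siq @ P @ Sj + dPp @ Si @ dPq @ Sj + dPp @ Si @ P @ Sjq)
    dA2 = np.trace(dPq @ Sip @ P @ Sj + P @ Sipq @ P @ Sj + P @ Sip @ dPq @ Sj + P @ Sip @ P @ Sjq)
    dA3 = np.trace(dPq @ Si @ dPp @ Sj + P @ Siq @ dPp @ Sj + P @ Si @ d2P @ Sj + P @ Si @ dPp @ Sjq)
    dA4 = np.trace(dPq @ Si @ P @ Sjp + P @ Siq @ P @ Sjp + P @ Si @ dPq @ Sjp + P @ Si @ P @ Sjpq)
    return dA1 + dA2 + dA3 + dA4


s, h = np.array([0.2, -0.1]), 1e-5
ep, eq = np.array([h, 0.0]), np.array([0.0, h])
print(dK_dsp(s, 0, 1), (K(s + ep, 0, 1) - K(s - ep, 0, 1)) / (2 * h))
print(d2K_dspdsq(s, 0, 1), (dK_dsp(s + eq, 0, 1) - dK_dsp(s - eq, 0, 1)) / (2 * h))
```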

3.4 Defining the Mean Response Vector, ECD Scale Matrix, Slack Variables and their Constraints for Convex Optimization

This leaves two system-specific quantities, $\mu(\theta)$, the sensor response vector, and $\Sigma(\theta)$, the scale matrix, still to be defined for the convex optimization, as well as the slack variables themselves. Setting up these quantities so that they remain true to the goal of sensor selection, while defining a properly positive definite argument as required by the definition of an ECD, requires careful consideration.

Among the first things to define are how many slack variables are required for this problem and, thus, what their uses and constraints might be. Since the goal is to fill the $m$ slots of an array from a collection of $n$ candidate sensors, and because any sensor can fill any slot, $nm$ variables, $s_{ij}$, are needed so that any sensor $i$ can be put in any slot $j$ of the array.

This decision for the sensors yields the following constraints,

$$\forall i:\ 0 \le \sum_{j=1}^{m} s_{ij} \le 1 \tag{54}$$

or

$$\forall i:\ \|s_i\|_2 \le 1 \tag{55}$$

where $s_i = (s_{i1}, \ldots, s_{im})$, since each sensor can be used at most once in the sensor array, and

$$\forall j:\ \sum_{i=1}^{n} s_{ij} = 1 \tag{56}$$

because each sensor slot must be filled, where the subscripts $i$ and $j$ denote the placement of sensor $i$ in array slot $j$.
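Most convex solvers accept linear constraints in matrix form. The Python sketch below is one way to encode the upper bound of eq. (54) and the equality of eq. (56) for a slack vector flattened as s[i*m + j] = s_ij. The flattening order and function name are our own choices. Nonnegativity of the s_ij is assumed to be supplied as variable bounds, and the norm form (55) is not encoded.

```python
import numpy as np


def selection_constraints(n: int, m: int):
    """Linear forms of eqs. (54) and (56) for the flattened slack vector.

    The slack vector is flattened as s[i * m + j] = s_ij. Returns
    (A_ub, b_ub, A_eq, b_eq) such that A_ub @ s <= b_ub is the upper bound
    of eq. (54) and A_eq @ s == b_eq is eq. (56).
    """
    A_ub = np.kron(np.eye(n), np.ones((1, m)))   # row i: s_i1 + ... + s_im
    A_eq = np.kron(np.ones((1, n)), np.eye(m))   # row j: s_1j + ... + s_nj
    return A_ub, np.ones(n), A_eq, np.ones(m)


n, m = 5, 3
A_ub, b_ub, A_eq, b_eq = selection_constraints(n, m)
s_uniform = np.full(n * m, 1.0 / n)              # a relaxed, fractional point
print(np.all(A_ub @ s_uniform <= b_ub), np.allclose(A_eq @ s_uniform, b_eq))
```

The fractional point $s_{ij} = 1/n$ satisfies both constraint sets whenever $m \le n$, which illustrates that the relaxed feasible set is much larger than the set of 0/1 assignments.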

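The slack variables themselves are also used in the context of the ECDs, since sensor selection implicitly redefines the underlying noise model. Supporting this observation is the parametrized mean response vector $\mu$, as well as its gradient and Hessian,

$$\mu(\theta;s) = \sum_{j=1}^{m} e_j\,\mu_j(\theta;s) \tag{57}$$

$$\mu_j(\theta;s) = \sum_{i=1}^{n} s_{ij}\,\mu_i(\theta) \tag{58}$$

$$\frac{\partial\mu}{\partial s_{ij}} = e_j\,\frac{\partial\mu_j}{\partial s_{ij}} = e_j\,\mu_i(\theta) \tag{59}$$

$$\frac{\partial^2\mu}{\partial s_p\,\partial s_q} = 0 \tag{60}$$

where $e_j$ is the unit vector for array slot $j$ and $\mu_i(\theta)$ is the response of candidate sensor $i$.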


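The ECD-specific scale matrix, $\Sigma(\theta;s)$, as well as its gradient and Hessian terms, are parametrized and constructed in a way analogous to the response vector $\mu$, as follows,

$$\Sigma(\theta;s) = \sigma(\theta;s)\otimes\sigma(\theta;s)^T \tag{61}$$

$$\sigma(\theta;s) = \sum_{j=1}^{m} e_j\,\sigma_j(\theta;s) \tag{62}$$

$$\sigma_j(\theta;s) = \sum_{i=1}^{n} s_{ij}\,\sigma_i(\theta) \tag{63}$$

$$\frac{\partial\sigma}{\partial s_{ij}} = e_j\,\frac{\partial\sigma_j}{\partial s_{ij}} = e_j\,\sigma_i(\theta) \tag{64}$$

$$\frac{\partial^2\sigma}{\partial s_p\,\partial s_q} = 0 \tag{65}$$

$$\frac{\partial\Sigma}{\partial s_{ij}} = \frac{\partial\sigma}{\partial s_{ij}}\otimes\sigma(\theta;s)^T + \sigma(\theta;s)\otimes\frac{\partial\sigma^T}{\partial s_{ij}} \tag{66}$$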
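$$\frac{\partial^2\Sigma}{\partial s_p\,\partial s_q} = \frac{\partial^2\sigma}{\partial s_p\,\partial s_q}\otimes\sigma(\theta;s)^T + \frac{\partial\sigma}{\partial s_p}\otimes\frac{\partial\sigma^T}{\partial s_q} + \frac{\partial\sigma}{\partial s_q}\otimes\frac{\partial\sigma^T}{\partial s_p} + \sigma(\theta;s)\otimes\frac{\partial^2\sigma^T}{\partial s_p\,\partial s_q} \tag{67}$$

for the scale matrix $\Sigma$, where $\otimes$ denotes the outer product.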
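The report writes $\sigma_i(\theta)$ without fixing its shape. If each $\sigma_i$ were a scalar, $\Sigma = \sigma\sigma^T$ in eq. (61) would have rank one and could not be inverted. The Python sketch below therefore assumes, hypothetically, that each candidate sensor carries a length-k row of noise loadings $\sigma_i$. Under that assumption $\sigma(\theta;s)$ is an m × k matrix and $\Sigma = \sigma\sigma^T$ is positive definite when k ≥ m and the loadings are generic. The sketch builds $\mu(\theta;s)$ and $\Sigma(\theta;s)$ at one fixed $\theta$ from a slack matrix S[i, j] = s_ij, with made-up sensor data, and checks eqs. (59) and (66) against finite differences.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, k = 6, 3, 4    # candidate sensors, array slots, noise factors per sensor (k assumed)

mu_cand = rng.uniform(0.5, 2.0, size=n)    # mu_i(theta) at one fixed theta (illustrative)
sigma_cand = rng.normal(size=(n, k))        # sigma_i(theta) as length-k rows (assumption)


def array_mean(S):
    """Eqs. (57)-(58): entry j is mu_j = sum_i s_ij mu_i, for S[i, j] = s_ij."""
    return S.T @ mu_cand


def array_sigma(S):
    """Eqs. (62)-(63): row j is sigma_j = sum_i s_ij sigma_i."""
    return S.T @ sigma_cand


def array_scale(S):
    """Eq. (61) with sigma stored as an (m, k) matrix: Sigma = sigma sigma^T."""
    sig = array_sigma(S)
    return sig @ sig.T


def dscale_dsij(S, i, j):
    """Eq. (66), using d sigma / d s_ij = e_j sigma_i from eq. (64)."""
    sig = array_sigma(S)
    dsig = np.zeros_like(sig)
    dsig[j] = sigma_cand[i]
    return dsig @ sig.T + sig @ dsig.T


S = np.zeros((n, m))
S[np.arange(m), np.arange(m)] = 1.0         # first m sensors in slots 0..m-1
print(array_mean(S), np.linalg.eigvalsh(array_scale(S)))
```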

3.5 Applying Convex Optimization: Interpreting the Results

Before running the convex optimization, a starting point that satisfies all of the constraints needs to be found. This point implicitly defines the subset region of the positive definite matrices available for the optimization. This is necessary because there is an inherent degeneracy in assigning sensors to slots: in an optimized setting, one could permute sensors among slots and obtain the same answer. However, in order to continuously reach another region where the sensors and slots are permuted, one would have to travel, by continuity, through an area where the FIM becomes singular. Consequently, each point exists in a subregion defined by these singular bounds, at which the objective function, the determinant of the inverse FIM, becomes infinite. Since all of these regions are identical up to relabeling, optimizing in one subregion is as good as optimizing in any of the others. Because the subregion so defined is described by positive definite matrices, this remains a problem in convex optimization. Thus, by selecting a specific starting point, one breaks the permutation symmetry of this problem while still allowing for the use of convex optimization in sensor selection.

On a more practical note, a unique starting point that satisfies this problem's constraints may be defined as follows:

Consider the sensor collection ordered in a list. Take the first $m$ sensors and assign each one to a unique array slot $j$. Set the corresponding slack variables, $s_{ij}$, for these sensor-slot selections equal to 1. Set all other slack variables equal to 0.

This defines a unique starting point for the optimization which obeys the system constraints.
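The starting-point rule above can be written in a few lines. The Python sketch below stores the slack variables as an n × m matrix S[i, j] = s_ij with 0-based indices (an implementation choice) and confirms that the result satisfies eqs. (54) and (56).

```python
import numpy as np


def initial_slack(n: int, m: int) -> np.ndarray:
    """Starting point from the rule above: the first m listed sensors fill the m slots.

    Returns S with S[i, j] = s_ij (sensor i in slot j), using 0-based indices.
    """
    if n < m:
        raise ValueError("need at least as many candidate sensors as array slots")
    S = np.zeros((n, m))
    S[np.arange(m), np.arange(m)] = 1.0   # sensor j -> slot j for the first m sensors
    return S


S0 = initial_slack(n=8, m=3)
print(S0.sum(axis=1))   # sensor usage, eq. (54): every entry must be <= 1
print(S0.sum(axis=0))   # slot occupancy, eq. (56): every entry must equal 1
```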

After the appropriate convex optimization has been performed, a vector of numeric values for the slack variable vector $s$ will be output. Assuming that the optimization has proceeded correctly, the vector should have $m$ slack variables close to 1 (otherwise, sort the slack variable vector in numeric order and choose the top $m$). Round these up to 1 and all other variables down to 0. The resulting vector tells the practitioner which sensors have been selected for which slots according to whether each slack variable is 1 or 0: if it is 1, the corresponding sensor and slot have been selected; otherwise they have not. This completes the sensor selection process.
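The rounding rule can be sketched as follows in Python, again storing the relaxed solution as a matrix S[i, j] = s_ij. The tolerance for "close to 1" is an assumed value. The final consistency check, which rejects a rounded selection that reuses a sensor or leaves a slot empty, is an addition of ours and not part of the procedure described above.

```python
import numpy as np


def round_selection(S: np.ndarray, m: int, tol: float = 0.05) -> list[tuple[int, int]]:
    """Turn a relaxed slack matrix S[i, j] into (sensor, slot) pairs.

    Keeps the entries within `tol` of 1; if there are not exactly m of them,
    falls back to the m largest entries, as described in the text.
    """
    flat = S.ravel()
    chosen = np.flatnonzero(flat >= 1.0 - tol)
    if chosen.size != m:
        chosen = np.argsort(flat)[::-1][:m]
    sensors, slots = np.unravel_index(chosen, S.shape)
    # Extra check (not in the text): each slot filled once, no sensor reused.
    if len(set(sensors.tolist())) != m or len(set(slots.tolist())) != m:
        raise ValueError("rounded selection reuses a sensor or leaves a slot empty")
    return sorted(zip(sensors.tolist(), slots.tolist()))


S_relaxed = np.array([[0.02, 0.97, 0.01],
                      [0.99, 0.00, 0.01],
                      [0.00, 0.02, 0.98],
                      [0.01, 0.01, 0.00]])
print(round_selection(S_relaxed, m=3))   # [(0, 1), (1, 0), (2, 2)]
```

Here sensor 0 is placed in slot 1, sensor 1 in slot 0, sensor 2 in slot 2, and sensor 3 is not used.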


4. CONCLUSIONS

This memo report has considered sensor selection for a nonspecific chemical array under the influence of correlated noise. It has used global error minimization, or conversely signal maximization, as the criterion for optimization, adopting the determinant of the covariance matrix, as idealized by the Fisher information matrix, as the scalar criterion for this optimization. Using these definitions, as well as the mathematical properties of the underlying matrix, it has set up this optimization problem in the context of convex optimization and has developed this scenario along with the supporting mathematics for this methodology.

It has presented two distinct approaches to this optimization problem. First, it has presented the optimization methodology for a family of solved, non-trivial correlated noise models, the elliptically contoured distributions, which include many standard distributions such as the multivariate Gaussian and Student-t distributions. It has also taken a more practical approach and considered a lower bound to the Fisher information matrix that would allow a working practitioner to select sensors with knowledge of only the covariance matrix and the sensor response, using the same framework and methodology. While the latter of these two approaches would not necessarily be optimal, it could be significantly better than what a practitioner might develop through trial and error.